pacman::p_load(sf, sfdep,
spatstat,
tmap, leaflet,
raster,
spNetwork,
tidyverse,
DT, knitr, kableExtra, gtsummary)1 Thailand’s Killer Roads
Road traffic injuries pose a significant public health issue in Thailand, with a high number of fatalities, injuries, and disabilities each year. These incidents have a profound impact not only on the victims and their families but also on society and the nation as a whole. According to the World Health Organization (WHO), approximately 20,000 people lose their lives in road accidents annually, equating to about 56 deaths each day.
Despite numerous government initiatives to reduce road casualties, the situation has seen minimal improvement. These issues underscore the need for a deeper understanding of road traffic accidents, which can be largely attributed to two primary factors: behavioral and environmental.
Behavioral factors, such as driver behavior and performance, are significant contributors to traffic accidents. These factors include driving style and skills, as well as direct and indirect driving behaviors. Environmental factors encompass conditions such as poor visibility during adverse weather (e.g., heavy rain or fog) and challenging road features (e.g., sharp bends, slippery slopes, and blind spots).
Previous studies have highlighted the value of Spatial Point Patterns Analysis in examining road traffic accidents. However, these studies often focus solely on either behavioral or environmental factors, neglecting the influence of temporal factors such as season, day of the week, or time of day.
In light of this, it is essential to investigate the factors affecting road traffic accidents in the Bangkok Metropolitan Region (BMR) using both spatial and spatio-temporal point patterns analysis methods. This approach aims to identify the leading causes of accidents and provide insights that can guide the development of effective policies and interventions to enhance road safety.
2 Getting Started
2.1 Loading Packages
In this exercise, we will be using the following packages:
| Package | Description |
|---|---|
| sf | For importing, managing, and handling geospatial data |
| spatstat | For point pattern analysis. In this hands-on exercise, it will be used to perform 1st- and 2nd-order spatial point patterns analysis and derive kernel density estimation layer |
| sfdep | Used to compute spatial weights |
| tmap | For thematic mapping |
| leaflet | For interactive maps |
| raster | For raster data manipulation functions |
| spNetwork | For spatial analysis on networks |
| tidyverse | For non-spatial data wrangling |
| DT, knitr, kableExtra | For building tables |
| gtsummary | For publication-ready analyticla and summary tables |
The following code chunk uses p_load() of pacman package to check if the aforementioned packages are installed in the computer. If they are, the libraries will be called into R.
2.2 The Data
| File | Source | Screenshot |
|---|---|---|
Thailand - SubnationalAdministrativeBoundaries, in SHP file format There are 3 administrative levels in the shapefile: 0 (country), 1 (province), 2 (district), and 3 (sub-district, tambon) boundaries. We will use level 1 to filter for the Bangkok Metropolitan Region. |
Humanitarian Data Exchange | |
| Thailand Roads (OpenStreetMap Export), in SHP file format | Humanitarian Data Exchange | |
Thailand Road Accident [2019-2022], in CSV format This dataset provides a comprehensive statistics on recorded road accidents in Thailand from 2019 to 2022, including time location, accident type, weather condition, and road characteristics. |
Kaggle |
2.2.1 Traffic Accident Data
read_csv() of the readr package allows us to import csv files into R Studio.
accidents <- read_csv("data/geospatial/thai_road_accident_2019_2022.csv")glimpse(accidents)Rows: 81,735
Columns: 18
$ acc_code <dbl> 571905, 3790870, 599075, 571924, 599523, 5…
$ incident_datetime <dttm> 2019-01-01 00:00:00, 2019-01-01 00:03:00,…
$ report_datetime <dttm> 2019-01-02 06:11:00, 2020-02-20 13:48:00,…
$ province_th <chr> "ลพบุรี", "อุบลราชธานี", "ประจวบคีรีขันธ์", "เชียงใ…
$ province_en <chr> "Loburi", "Ubon Ratchathani", "Prachuap Kh…
$ agency <chr> "department of rural roads", "department o…
$ route <chr> "แยกทางหลวงหมายเลข 21 (กม.ที่ 31+000) - บ้านวั…
$ vehicle_type <chr> "motorcycle", "private/passenger car", "mo…
$ presumed_cause <chr> "driving under the influence of alcohol", …
$ accident_type <chr> "other", "rollover/fallen on straight road…
$ number_of_vehicles_involved <dbl> 1, 1, 2, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, …
$ number_of_fatalities <dbl> 0, 0, 1, 0, 0, 0, 0, 1, 3, 0, 0, 1, 0, 0, …
$ number_of_injuries <dbl> 2, 2, 0, 1, 0, 2, 2, 0, 0, 1, 1, 0, 1, 1, …
$ weather_condition <chr> "clear", "clear", "clear", "clear", "clear…
$ latitude <dbl> 14.959105, 15.210738, 12.374259, 18.601721…
$ longitude <dbl> 100.87346, 104.86269, 99.90795, 98.80420, …
$ road_description <chr> "straight road", "straight road", "wide cu…
$ slope_description <chr> "no slope", "no slope", "slope area", "no …
A quick look into the data using glimpse() of the dplyr package reveals that there are 18 variables in the data, including:
| Column Name | Description |
|---|---|
| acc_code | The accident code or identifier. |
| incident_datetime | The date and time of the accident occurrence. |
| report_datetime | The date and time when the accident was reported. |
| province_th | The name of the province in Thailand, written in Thai. |
| province_en | The name of the province in Thailand, written in English. |
| agency | The government agency responsible for the road and traffic management. |
| route | The route or road segment where the accident occurred. |
| vehicle_type | The type of vehicle involved in the accident. |
| presumed_cause | The presumed cause or reason for the accident. |
| accident_type | The type of nature of the accident. |
| number_of_vehicles_involved | The number of vehicles involved in the accident. |
| number_of_fatalities | The number of fatalities resulting from the accident. |
| number_of_injuries | The number of injuries resulting from the accident. |
| weather_condition | The weather condition at the time of the accident. |
| latitude | The latitude coordinate of the accident location. |
| longitude | The longitude coordinate of the accident location. |
| road_description | The description of the road type or configuration where the accident occurred. |
| slope_description | The description of the slope condition at the accident location. |
Since the province_th and route columns are in Thai, I will remove them. The province column is already available in English, and the route information can be projected using the latitude and longitude columns. Additionally, I will remove the report_datetime column and use incident_datetime for our analysis.
accidents <- accidents %>%
select(-c(province_th,
route,
report_datetime))The gtsummary package provides us with descriptive statistics of the dataset and also includes the amount of missingness in each variable.
| Characteristic | N | N = 81,7351 |
|---|---|---|
| acc_code | 81,735 | 3,824,084 (3,789,460, 5,831,089) |
| incident_datetime | 81,735 | 2019-01-01 to 2022-12-31 23:55:00 |
| province_en | 81,735 | |
| Amnat Charoen | 196 (0.2%) | |
| Ang Thong | 1,125 (1.4%) | |
| Bangkok | 6,439 (7.9%) | |
| buogkan | 583 (0.7%) | |
| Buri Ram | 327 (0.4%) | |
| Chachoengsao | 1,407 (1.7%) | |
| Chai Nat | 692 (0.8%) | |
| Chaiyaphum | 471 (0.6%) | |
| Chanthaburi | 1,360 (1.7%) | |
| Chiang Mai | 3,407 (4.2%) | |
| Chiang Rai | 858 (1.0%) | |
| Chon Buri | 4,120 (5.0%) | |
| Chumphon | 1,039 (1.3%) | |
| Kalasin | 659 (0.8%) | |
| Kamphaeng Phet | 822 (1.0%) | |
| Kanchanaburi | 1,165 (1.4%) | |
| Khon Kaen | 1,224 (1.5%) | |
| Krabi | 741 (0.9%) | |
| Lampang | 1,018 (1.2%) | |
| Lamphun | 744 (0.9%) | |
| Loburi | 1,036 (1.3%) | |
| Loei | 742 (0.9%) | |
| Mae Hong Son | 308 (0.4%) | |
| Maha Sarakham | 832 (1.0%) | |
| Mukdahan | 668 (0.8%) | |
| Nakhon Nayok | 386 (0.5%) | |
| Nakhon Pathom | 891 (1.1%) | |
| Nakhon Phanom | 462 (0.6%) | |
| Nakhon Ratchasima | 3,330 (4.1%) | |
| Nakhon Sawan | 1,403 (1.7%) | |
| Nakhon Si Thammarat | 1,363 (1.7%) | |
| Nan | 931 (1.1%) | |
| Narathiwat | 477 (0.6%) | |
| Nong Bua Lam Phu | 288 (0.4%) | |
| Nong Khai | 297 (0.4%) | |
| Nonthaburi | 827 (1.0%) | |
| Pathum Thani | 1,923 (2.4%) | |
| Pattani | 395 (0.5%) | |
| Phangnga | 328 (0.4%) | |
| Phatthalung | 1,068 (1.3%) | |
| Phayao | 543 (0.7%) | |
| Phetchabun | 1,704 (2.1%) | |
| Phetchaburi | 873 (1.1%) | |
| Phichit | 226 (0.3%) | |
| Phitsanulok | 800 (1.0%) | |
| Phra Nakhon Si Ayutthaya | 1,571 (1.9%) | |
| Phrae | 1,466 (1.8%) | |
| Phuket | 444 (0.5%) | |
| Prachin Buri | 809 (1.0%) | |
| Prachuap Khiri Khan | 1,141 (1.4%) | |
| Ranong | 140 (0.2%) | |
| Ratchaburi | 497 (0.6%) | |
| Rayong | 770 (0.9%) | |
| Roi Et | 721 (0.9%) | |
| Sa Kaeo | 699 (0.9%) | |
| Sakon Nakhon | 973 (1.2%) | |
| Samut Prakan | 2,241 (2.7%) | |
| Samut Sakhon | 1,015 (1.2%) | |
| Samut Songkhram | 524 (0.6%) | |
| Saraburi | 1,158 (1.4%) | |
| Satun | 321 (0.4%) | |
| Si Sa Ket | 739 (0.9%) | |
| Sing Buri | 448 (0.5%) | |
| Songkhla | 1,231 (1.5%) | |
| Sukhothai | 1,305 (1.6%) | |
| Suphan Buri | 3,056 (3.7%) | |
| Surat Thani | 1,983 (2.4%) | |
| Surin | 843 (1.0%) | |
| Tak | 1,859 (2.3%) | |
| Trang | 784 (1.0%) | |
| Trat | 469 (0.6%) | |
| Ubon Ratchathani | 751 (0.9%) | |
| Udon Thani | 946 (1.2%) | |
| unknown | 34 (<0.1%) | |
| Uthai Thani | 503 (0.6%) | |
| Uttaradit | 949 (1.2%) | |
| Yala | 343 (0.4%) | |
| Yasothon | 504 (0.6%) | |
| agency | 81,735 | |
| department of highways | 75,304 (92%) | |
| department of rural roads | 5,115 (6.3%) | |
| expressway authority of thailand | 1,316 (1.6%) | |
| vehicle_type | 81,735 | |
| 4-wheel pickup truck | 28,445 (35%) | |
| 6-wheel truck | 3,019 (3.7%) | |
| 7-10-wheel truck | 2,364 (2.9%) | |
| bicycle | 257 (0.3%) | |
| large passenger vehicle | 421 (0.5%) | |
| large truck with trailer | 6,257 (7.7%) | |
| motorcycle | 14,874 (18%) | |
| motorized tricycle | 294 (0.4%) | |
| other | 3,513 (4.3%) | |
| passenger pickup truck | 325 (0.4%) | |
| pedestrian | 219 (0.3%) | |
| private/passenger car | 20,814 (25%) | |
| three-wheeled vehicle | 28 (<0.1%) | |
| tractor/agricultural vehicle | 63 (<0.1%) | |
| van | 842 (1.0%) | |
| presumed_cause | 81,735 | |
| abrupt lane change | 134 (0.2%) | |
| aggressive driving/overtaking | 12 (<0.1%) | |
| brake/anti-lock brake system failure | 55 (<0.1%) | |
| cutting in closely by people/vehicles/animals | 6,724 (8.2%) | |
| dangerous curve | 61 (<0.1%) | |
| debris/obstruction on the road | 294 (0.4%) | |
| disabled vehicle without proper signals | 37 (<0.1%) | |
| disabled vehicle without proper signals/signs | 5 (<0.1%) | |
| driving in the wrong lane | 37 (<0.1%) | |
| driving under the influence of alcohol | 1,464 (1.8%) | |
| driving without headlights/illumination | 20 (<0.1%) | |
| external disturbance | 2 (<0.1%) | |
| failure to signal enter/exit parking | 43 (<0.1%) | |
| failure to yield right of way | 133 (0.2%) | |
| failure to yield/signal | 215 (0.3%) | |
| falling asleep | 4,700 (5.8%) | |
| ignoring stop sign while leaving intersection | 79 (<0.1%) | |
| illegal overtaking | 445 (0.5%) | |
| inadequate visibility | 13 (<0.1%) | |
| insufficient light | 69 (<0.1%) | |
| internal disturbance | 1 (<0.1%) | |
| loss of control | 49 (<0.1%) | |
| medical condition | 40 (<0.1%) | |
| narrow road | 10 (<0.1%) | |
| navigation equipment failure | 1 (<0.1%) | |
| no presumed cause related to driver | 1 (<0.1%) | |
| no presumed cause related to road conditions | 4 (<0.1%) | |
| no presumed cause related to vehicle conditions | 1 (<0.1%) | |
| no road divider lines | 1 (<0.1%) | |
| no traffic light system | 1 (<0.1%) | |
| no traffic signs | 9 (<0.1%) | |
| obstruction in sight | 7 (<0.1%) | |
| other | 1,576 (1.9%) | |
| overloaded vehicle | 176 (0.2%) | |
| repair/construction on the road | 6 (<0.1%) | |
| reversing vehicle | 70 (<0.1%) | |
| road in poor condition | 7 (<0.1%) | |
| running red lights/traffic signals | 911 (1.1%) | |
| slippery road | 85 (0.1%) | |
| speeding | 60,373 (74%) | |
| straddling lanes | 31 (<0.1%) | |
| sudden stop | 48 (<0.1%) | |
| tailgating | 181 (0.2%) | |
| traffic light system failure | 3 (<0.1%) | |
| turn signal system failure | 2 (<0.1%) | |
| unfamiliarity with the route/unskilled driving | 677 (0.8%) | |
| using mobile phone while driving | 33 (<0.1%) | |
| using psychoactive substances | 1 (<0.1%) | |
| vehicle electrical system failure | 11 (<0.1%) | |
| vehicle equipment failure | 2,747 (3.4%) | |
| worn-out/tire blowout | 127 (0.2%) | |
| เส้นแบ่งทิศทางจราจรชำรุด | 1 (<0.1%) | |
| ป้ายจราจรชำรุด | 1 (<0.1%) | |
| มึนเมาจากแอลกอฮอล์ | 1 (<0.1%) | |
| accident_type | 81,735 | |
| collision at intersection corner | 1,067 (1.3%) | |
| collision during overtaking | 83 (0.1%) | |
| collision with obstruction (on road surface) | 2,742 (3.4%) | |
| head-on collision (not overtaking) | 3,944 (4.8%) | |
| other | 4,188 (5.1%) | |
| pedestrian collision | 945 (1.2%) | |
| rear-end collision | 24,901 (30%) | |
| rollover/fallen on curved road | 10,443 (13%) | |
| rollover/fallen on straight road | 33,046 (40%) | |
| side collision | 336 (0.4%) | |
| turning/retreating collision | 40 (<0.1%) | |
| number_of_vehicles_involved | 81,735 | 1.00 (1.00, 2.00) |
| number_of_fatalities | 81,735 | 0.00 (0.00, 0.00) |
| number_of_injuries | 81,735 | 0.00 (0.00, 1.00) |
| weather_condition | 81,735 | |
| clear | 69,943 (86%) | |
| dark | 368 (0.5%) | |
| foggy | 544 (0.7%) | |
| land slide | 1 (<0.1%) | |
| natural disaster | 47 (<0.1%) | |
| other | 162 (0.2%) | |
| rainy | 10,670 (13%) | |
| latitude | 81,376 | 14.5 (13.5, 16.6) |
| NA | 359 | |
| longitude | 81,376 | 100.56 (99.89, 101.29) |
| NA | 359 | |
| road_description | 81,735 | |
| bridge (across river/canal) | 5 (<0.1%) | |
| connecting to private area | 463 (0.6%) | |
| connecting to public/commercial area | 1,001 (1.2%) | |
| connecting to school area | 121 (0.1%) | |
| four-way intersection | 348 (0.4%) | |
| grade-separated intersection/ramps | 184 (0.2%) | |
| lane-changing area | 7 (<0.1%) | |
| merge lane | 19 (<0.1%) | |
| motorcycle lane | 1 (<0.1%) | |
| other | 7,614 (9.3%) | |
| pedestrian path | 1 (<0.1%) | |
| roundabout | 11 (<0.1%) | |
| sharp curve | 616 (0.8%) | |
| straight road | 58,183 (71%) | |
| t-intersection | 414 (0.5%) | |
| u-turn area | 10 (<0.1%) | |
| wide curve | 12,552 (15%) | |
| y-intersection | 184 (0.2%) | |
| zebra crossing/pedestrian crossing | 1 (<0.1%) | |
| slope_description | 81,735 | |
| no slope | 65,797 (81%) | |
| other | 11,575 (14%) | |
| slope area | 4,363 (5.3%) | |
| 1 Median (IQR); Range; n (%) | ||
Observations
In terms of data health, we can see that there are 359 missing records in the latitude and longitude columns. Since these missing records make up <5% of the 81,735 total road accidents recorded, we can remove these from our data set.
We can use latitude and longitude columns to convert the data into a sf object and transform the coordinates into CRS. The EPSG code for latitude-longitude projection is 4326.
Thailand adopts EPSG 32647, which is in metres.
Since the Bangkok Metropolitan Region is made up of 6 provinces – Bangkok, Nonthaburi, Nakhon Pathom, Pathum Thani, Samut Prakan, and Samut Sakhon, we can use the province_en column to filter for these provinces.
The code chunk performs several functions for preparing and transforming the dataset:
- The
filter()function from the dplyr package is used to remove rows with missing or empty longitude and latitude values, ensuring that the dataset contains only valid spatial data. - The sf package’s
st_as_sf()function converts the data frame into a spatial object (simple feature) using the longitude and latitude columns, setting the coordinate reference system (CRS) to EPSG 4326. - The spatial data is reprojected to EPSG 32647 (a local UTM projection used in Thailand) using
st_transform()from the sf package. - The
filter()function is also applied to retain data exclusively from the Bangkok Metropolitan Region (BMR) by filtering for specific provinces.
accidents_bmr <- accidents %>%
# Removes rows with missing longitude and latitude values
filter(!is.na(longitude) & longitude != "",
!is.na(latitude) & latitude !="") %>%
# Converts data to an sf object using longitude and latitude
st_as_sf(coords = c("longitude", "latitude"),
crs = 4326) %>%
# Transforms to the projection used in Thailand
st_transform(crs = 32647) %>%
# Filter for BMR data
filter(province_en %in% c("Bangkok", "Nonthaburi", "Nakhon Pathom", "Pathum Thani", "Samut Prakan", "Samut Sakhon")) Let’s perform a check for duplicated records before we move on. The code chunk below identifies all rows in the accidents_bmr dataframe that have an exact duplicate (i.e., another row with the same values in all columns) using group_by_all().
duplicate <- accidents_bmr %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
duplicateSimple feature collection with 0 features and 13 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: WGS 84 / UTM zone 47N
# A tibble: 0 × 14
# ℹ 14 variables: acc_code <dbl>, incident_datetime <dttm>, province_en <chr>,
# agency <chr>, vehicle_type <chr>, presumed_cause <chr>,
# accident_type <chr>, number_of_vehicles_involved <dbl>,
# number_of_fatalities <dbl>, number_of_injuries <dbl>,
# weather_condition <chr>, road_description <chr>, slope_description <chr>,
# geometry <GEOMETRY [m]>
Results confirm that there are no duplicated records found.
Let’s take a quick glance at the points:
tmap_mode('plot')
tm_shape(accidents_bmr) +
tm_dots(col = "#800200",
alpha=0.4,
size=0.05) +
tm_layout(main.title = "Accidents",
main.title.position = "center",
main.title.size = 1,
bg.color = "#E4D5C9",
frame = F)
2.2.2 Administrative Boundary
The adminboundary dataset, which we downloaded from HDX, is in ESRI shapefile format. To use this data in an R-environment, we need to import it as an sf object. We can do this using the st_read() function of the sf package. This function reads the shapefile data and returns an sf object that can be used for further analysis.
In the code chunk below, we use %>% operator is used to pipe the output of st_read() to the st_transform() function. Since the dataset we are using is the Thailand boundary, we need to assign the standard coordinate reference system for Thailand, which is EPSG 32647. st_transform() function transforms the coordinate reference system of the sf object to 32647.
adminboundary <- st_read(dsn = "data/geospatial",
layer = "tha_admbnda_adm1_rtsd_20220121") %>%
st_transform(crs = 32647)Reading layer `tha_admbnda_adm1_rtsd_20220121' from data source
`C:\kytjy\ISSS626-GAA\Take-home_Ex\Take-home_Ex01\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 77 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 97.34336 ymin: 5.613038 xmax: 105.637 ymax: 20.46507
Geodetic CRS: WGS 84
st_crs(adminboundary)Coordinate Reference System:
User input: EPSG:32647
wkt:
PROJCRS["WGS 84 / UTM zone 47N",
BASEGEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]],
CONVERSION["UTM zone 47N",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",99,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Navigation and medium accuracy spatial referencing."],
AREA["Between 96°E and 102°E, northern hemisphere between equator and 84°N, onshore and offshore. China. Indonesia. Laos. Malaysia - West Malaysia. Mongolia. Myanmar (Burma). Russian Federation. Thailand."],
BBOX[0,96,84,102]],
ID["EPSG",32647]]
Preliminary look into the adminboundary data shows that ADM1_EN can be used to filter for the 6 regions in BMR.
glimpse(adminboundary)Rows: 77
Columns: 17
$ Shape_Leng <dbl> 2.417227, 1.695100, 1.251111, 1.884945, 3.041716, 1.739908,…
$ Shape_Area <dbl> 0.13133873, 0.07926199, 0.05323766, 0.12698345, 0.21393797,…
$ ADM1_EN <chr> "Bangkok", "Samut Prakan", "Nonthaburi", "Pathum Thani", "P…
$ ADM1_TH <chr> "กรุงเทพมหานคร", "สมุทรปราการ", "นนทบุรี", "ปทุมธานี", "พระนครศรีอ…
$ ADM1_PCODE <chr> "TH10", "TH11", "TH12", "TH13", "TH14", "TH15", "TH16", "TH…
$ ADM1_REF <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM1ALT1EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM1ALT2EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM1ALT1TH <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM1ALT2TH <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM0_EN <chr> "Thailand", "Thailand", "Thailand", "Thailand", "Thailand",…
$ ADM0_TH <chr> "ประเทศไทย", "ประเทศไทย", "ประเทศไทย", "ประเทศไทย", "ประเทศ…
$ ADM0_PCODE <chr> "TH", "TH", "TH", "TH", "TH", "TH", "TH", "TH", "TH", "TH",…
$ date <date> 2019-02-18, 2019-02-18, 2019-02-18, 2019-02-18, 2019-02-18…
$ validOn <date> 2022-01-22, 2022-01-22, 2022-01-22, 2022-01-22, 2022-01-22…
$ validTo <date> -001-11-30, -001-11-30, -001-11-30, -001-11-30, -001-11-30…
$ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((674339.8 15..., MULTIPOLYGON (…
adminboundary_bmr <- adminboundary %>%
select("ADM1_EN") %>%
filter(ADM1_EN %in% c("Bangkok", "Nonthaburi", "Nakhon Pathom", "Pathum Thani", "Samut Prakan", "Samut Sakhon"))After importing the dataset, we will plot it to see how it looks using tmap.
tmap_mode('plot')
tm_shape(adminboundary_bmr)+
tm_fill(col ="#f4e9e8",
alpha = 0.6) +
tm_borders(col = "#ddafa1",
lwd = 0.1,
alpha = 1) +
tm_layout(main.title = "BMR Administrative Boundary",
main.title.position = "center",
main.title.size = 1,
bg.color = "#E4D5C9",
frame = F)
Let’s plot out the administrative boundaries together with the points of accidents:
tmap_mode('plot')
tm_shape(adminboundary_bmr) +
tm_fill(col ="#f4e9e8",
alpha = 0.6) +
tm_borders(col = "#ddafa1",
lwd = 0.1,
alpha = 1) +
tm_shape(accidents_bmr) +
tm_dots(col = "#800200",
alpha=0.4,
size=0.05) +
tm_layout(main.title = "BMR Administrative Boundary and Accidents",
main.title.position = "center",
main.title.size = 1,
bg.color = "#E4D5C9",
frame = F)
2.2.3 Road Lines
The code chunk below imports MP_SUBZONE_WEB_PL shapefile by using st_read() of sf packages.
roads <- st_read(dsn = "data/geospatial",
layer = "hotosm_tha_roads_lines_shp")Observations
The geometry of our data is in multi-linestrings. To convert to individual linestrings, we use
st_cast().Note that roads does not have crs information. The
st_set_crs()function allows us to assign coordinate reference system for the data based on latitude and longitude seen under the Bounding Box variable which is decimal degrees. After assigning the CRS,st_transform()is used to reproject the data to the local CRS. Finally, we can usest_crs()to verify that the correct CRS has been applied.
roads_sf <- st_set_crs(roads, 4326) %>%
st_transform(crs = 32647) %>%
st_cast("LINESTRING")st_crs(roads_sf)Results of the code above show that the EPSG is indicated as 32647 now.
Upon reviewing the road classifications against the highway descriptions by OpenStreetMap, we observe that not all categories are relevant to our analysis, which focuses primarily on driving networks. We will keep the 13 types of road segments below for the scope of our study.
| Name | Description |
|---|---|
| Primary, Primary_Link | Major highway linking large towns |
| Secondary, Secondary_Link | Highways which are not part of major routes, but nevertheless form a link in the national route network |
| Tertiary, Tertiary_Link | Roads connecting smaller settlements, and within large settlements for roads connecting local centres |
| Trunk, Trunk_Link | Major highway that don’t meet the requirements for motorways, and their link roads (sliproads and ramps). |
| Motorway, Motorway_Link | Highest-performance roads within a territory, generally referred to as motorways, freeways, or expressways. |
| Residential | Road generally used for local traffic within the settlement. Primarily used for access to residential properties but may include access to some non-residential properties (e.g. a corner shop or convenience store). |
| Unclassified | Roads with the lowest priority in the interconnecting road network |
| Service | Provides access to a building service station, beach, campsite, industrial estate, business park, etc. |
In the code chunk below, we will first specify the road classes that we want to retain. Next, we will filter roads_sf object to remove all the rows that does not have our desired highway attribute value.
highwaytypes <- c("primary",
"primary_link",
"secondary",
"secondary_link",
"tertiary",
"tertiary_link",
"trunk",
"trunk_link",
"motorway",
"motorway_link",
"residential",
"unclassified",
"service")
roads_sf_filtered <- roads_sf %>%
select("highway") %>%
filter(highway %in% highwaytypes)Since the roads dataset includes areas outside BMR and our analysis is focused on the region within the BMR, we will need to remove unnecessary rows. To do so, we will use st_intersection().
roads_bmr <- st_intersection(roads_sf_filtered,
adminboundary_bmr)roads_bmrSimple feature collection with 555712 features and 2 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 588649.4 ymin: 1484439 xmax: 712417 ymax: 1579041
Projected CRS: WGS 84 / UTM zone 47N
First 10 features:
highway ADM1_EN geometry
1 secondary Bangkok LINESTRING (693686.1 151979...
2 residential Bangkok LINESTRING (693358 1519300,...
3 secondary_link Bangkok LINESTRING (692949.1 151886...
4 service Bangkok LINESTRING (693256 1519184,...
5 secondary Bangkok LINESTRING (692810.8 151863...
6 service Bangkok LINESTRING (693877.2 152042...
27 service Bangkok LINESTRING (691782.9 152016...
49 service Bangkok LINESTRING (668739.2 152053...
50 service Bangkok LINESTRING (665017.7 151746...
54 service Bangkok LINESTRING (665068 1517492,...
Now that we have scoped out the dataset, we will now plot to see the BMR road network.
par(bg = '#E4D5C9', mar = c(0,0,1,0))
plot(st_geometry(adminboundary_bmr),
col = "#f4e9e8",
main = "Road Network in BMR")
plot(st_geometry(roads_bmr),
add=T,
col='#800200')
3 Data Wrangling
3.1 Deriving new variables from Accidents data
Using the incident_datetime column, we can also derive additional columns such as seasons, month, day of the week, time of the day (e.g. morning or evening peak periods). The morning rush hour is said to last from 6am to 9am and evening rush hour is reported to be from 4pm to 7pm (The Nation).
I also think it would be interesting to derive a column that indicates accidents that happened during the Songkran festival holiday. Notoriously dubbed as the Seven Deadly Days of Songkran, road accidents in Bangkok is said to surge due increased traffic from people traveling to celebrate with family.
accidents_bmr_extra <- accidents_bmr %>%
# Derive month, days, hour columns as well as a Songkran indicator
mutate(
month = month(incident_datetime,
label = FALSE),
day = wday(incident_datetime,
label = FALSE),
hour = hour(incident_datetime),
songkran = ifelse(
as_date(incident_datetime) >= as_date(paste0(year(incident_datetime), "-04-09")) &
as_date(incident_datetime) <= as_date(paste0(year(incident_datetime), "-04-16")) &
year(incident_datetime) %in% c(2019, 2020, 2021, 2022),
1,
0)) %>%
# Derive season column and peak period indicator
mutate(
weektype = ifelse(
day %in% c("6", "7"),
"weekend",
"weekday"
),
season = ifelse(
month %in% c("2", "3", "4", "5"),
"Summer",
ifelse(
month %in% c("6", "7", "8", "9", "10"),
"Rainy",
"Winter"
)
),
peakperiod = ifelse(
hour > 6 & hour < 9,
"morningpeak",
ifelse(
hour > 16 & hour < 19,
"eveningpeak",
"non-peak"
)
)
) %>%
# Drop columns not required anymore
select(-c("acc_code",
"incident_datetime",
"agency"
))The code chunk above performs the following function:
- The lubridate package within tidyverse is utilised to extract temporal components (month, day of the week, and hour) from the incident_datetime column using the
month(),wday(), andhour()functions, respectively. - The songkran indicator is created using
ifelse(), along withas_date()andyear(), to generate a binary variable (1 for “Yes”, 0 for “No”) indicating whether the accident occurred during the Songkran festival (April 9 to April 16) in the years 2019, 2020, 2021, or 2022. - Two additional indicators are derived using
ifelse():- season variable: classifies accidents into seasons (“Summer”, “Rainy”, or “Winter”) based on the month.
- peakperiod variable: identifies whether the accident occurred during morning peak hours (7-9 AM), evening peak hours (5-7 PM), or non-peak times.
- Finally,
select()from dplyr is used to keep only the necessary variables for the study.
3.2 Converting sf format into spatstat’s ppp format
In order to use the capabilities of spatstat package, a spatial dataset should be converted into an object of class planar point pattern (ppp). A ppp object contains the spatial coordinates of the points, the marks attached to the points (if any), the window in which the points were observed, and the name of the unit of length for the spatial coordinates. Thus, a single object of class ppp contains all the information required to perform spatial point pattern analysis.
In previous section, we have created sf objects of accident points. Now, we will convert them into ppp objects using as.ppp() function from spatstat package.
The code chunk below converts the accidents_bmr_extra object to a point pattern object of class ppp. st_coordinates() function is used to extract the coordinates of the accidents_bmr_extra object and st_bbox() function is used to extract the bounding box of the accidents_bmr_extra object. The resulting object accidents_ppp is a point pattern object of class ppp.
accidents_ppp <- as.ppp(st_coordinates(accidents_bmr_extra),
st_bbox(accidents_bmr_extra))
par(bg = '#E4D5C9',
mar = c(0,0,1,0))
plot(accidents_ppp)
3.3 Duplicates check
We will use summary() function to get summary information of accidents_ppp object.
summary(accidents_ppp)Planar point pattern: 12986 points
Average intensity 1.218049e-06 points per square unit
*Pattern contains duplicated points*
Coordinates are given to 10 decimal places
Window: rectangle = [591277.5, 710166.1] x [1486845.7, 1576520.5] units
(118900 x 89670 units)
Window area = 10661300000 square units
Observations
Note that the message above suggests that the pattern contains duplicated points.
When analysing spatial point processes, it is important to avoid duplication of points. This is because statistical methodology for spatial point processes is based largely on the assumption that points of the process can never be coincident. When the data have coincident points, some statistical procedures designed for simple point processes will be affected.
3.4 Jittering
To resolve the issue of duplicated points, we apply jittering with the rjitter() function. This adds a small variation to the points, preventing them from occupying the exact same location.
set.seed(1234)
accidents_ppp_jit <- rjitter(accidents_ppp,
retry=TRUE,
nsim=99,
drop=TRUE)The code below checks the jittered points in the chosen simulation (i.e. Simulation 99) to ensure that no duplicates remain after applying jittering.
any(duplicated(accidents_ppp_jit[["Simulation 99"]]))[1] FALSE
par(bg = '#E4D5C9',
mar = c(0,0,1,0))
accidents_ppp_jit <- accidents_ppp_jit[["Simulation 99"]]
plot(accidents_ppp_jit)
3.5 Creating Observation Windows
Many data types in spatstat require us to specify the region of space inside which the data were observed. This is the observation window and it is represented by an object of class owin. In this analysis, our study area is BMR, hence we will use BMR boundary as the observation window for spatial point pattern analysis.
To convert our adminboundary_bmr sf object to owin object, we will use as.owin() function from spatstat package.
adminboundary_bmr_owin <- as.owin(adminboundary_bmr)
par(bg = '#E4D5C9',
mar = c(0,0,1,0))
plot.owin(adminboundary_bmr_owin)
summary(adminboundary_bmr_owin)Window: polygonal boundary
single connected closed polygon with 13779 vertices
enclosing rectangle: [587893.5, 712440.5] x [1484413.7, 1579076.3] units
(124500 x 94660 units)
Window area = 7668990000 square units
Fraction of frame area: 0.65
3.6 Combing ppp and owin objects
In section 3.4, we have created our ppp object (accidents_ppp_jit) which represents the spatial points of accident locations. In section 3.5, we have created a owin object called (adminboundary_bmr_owin), which represent the observation window of our analysis.
The observation window adminboundary_bmr_owin and the point pattern accidents_ppp_jit can be combined, so that the custom window replaces the default rectangular extent (as seen in section 3.2).
acc_bmr_ppp = accidents_ppp_jit[adminboundary_bmr_owin]
par(bg = '#E4D5C9',
mar = c(0,0,1,0))
plot(acc_bmr_ppp)
4 Exploratory Spatial Data Analysis
Descriptive statistics are used in point pattern analysis to summarise a point pattern’s basic properties, such as its central tendency and dispersion. The mean center and the median centre are 2 frequently used metrics for central tendency.
4.1 Measuring Central Tendency
4.1.1 Mean Center
Mean center is the arithmetic average of the (x, y) coordinates of all point in the study area. Similar to mean in statistical analysis, mean center is influenced to a greater degree by the outliers.
accidents_xy <- st_coordinates(accidents_bmr_extra)
accidents_mc <- apply(accidents_xy, 2, mean)
accidents_mc X Y
668399.5 1523495.8
The results show that the mean centre is at (668399.5, 1523495.9).
4.1.2 Median Center
Median center is the location that minimises the sum of distances required to travel to all points within an observation window. The procedure begins at a predetermined point, such as the median center, as the initial point. Then, the algorithm updates the median center’s new coordinates (x’, y’) continually until the optimal value is reached. The median center, as opposed to the mean center, offers a more reliable indicator of central tendency as it is unaffected by outliers.
accidents_medc <- apply(accidents_xy, 2, median)
accidents_medc X Y
673446.1 1520755.0
Based on the results, the median centre of accidents is (673446.1, 1520755.0).
The mean centers and median centers are similar. This may imply that the distribution of the data is relatively balanced and there is not a significant difference in the spatial patterns between the accident points. Additionally, this indicates that both the mean center and median center are effective measures for analysing the central tendency of the data in this context.
par(bg = '#E4D5C9', mar = c(0,0,1,0))
plot(st_geometry(adminboundary_bmr),
col='#f4e9e8')
plot(accidents_xy,
add = T, cex=0.7, pch = 21,
main="Mean and Median Centers of Accidents in BMR")
points(cbind(accidents_mc[1], accidents_mc[2]), pch='*', col='#f5347f', cex=3)
points(cbind(accidents_medc[1], accidents_medc[2]), pch='*', col='#bb8bdc', cex=3)
4.2 Measuring Dispersion
4.2.1 Standard Distance
Standard distances are defined similarly to standard deviations. This indicator measures how dispersed a group of points is around its mean center.
accidents_sd <- sqrt(sum((accidents_xy[,1] - accidents_mc[1])^2 +
(accidents_xy[,2] - accidents_mc[2])^2)
/ nrow(accidents_xy))
accidents_sd[1] 27235.14
4.2.2 Plotting Standard Distance
In this section, we will create bearing circle of accident points using the standard distance value we have calculated earlier. This can provide visual representation of the dispersion.
par(bg = '#E4D5C9',
mar = c(0,0,1,0))
plot(st_geometry(adminboundary_bmr), col='#f4e9e8', main="Standard Distance of Accidents in BMR")
plot(accidents_xy, cex=.5)
points(cbind(accidents_mc[1], accidents_mc[2]), pch='*', col='#f5347f', cex=3)
bearing <- 1:360 * pi/180
cx <- accidents_mc[1] + accidents_sd * cos(bearing)
cy <- accidents_mc[2] + accidents_sd * sin(bearing)
circle <- cbind(cx, cy)
lines(circle, col='#f5347f', lwd=2)
4.3 Spatial Randomness Test
Clark and Evans (1954) give a very simple test of spatial randomness called Clark and Evans aggregation index (R). It is the ratio of the observed mean nearest neighbour distance in the pattern to that expected for a Poisson point process of the same intensity. R-value >1 suggests ordering, while R-value <1 suggests clustering.
We will perform the Clark-Evans test of aggregation for a spatial point pattern by using clarkevans.test() of statspat.
The test hypotheses are:
\(H_0\) = The distribution of accident points are randomly distributed.
\(H_1\) = The distribution of accidents points are not randomly distributed.
The 95% confidence interval will be used.
set.seed(1234)
clarkevans.test(acc_bmr_ppp,
correction="none",
clipregion="adminboundary_bmr_owin",
alternative=c("clustered"),
nsim=99)
Clark-Evans test
No edge correction
Z-test
data: acc_bmr_ppp
R = 0.22579, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)
The Clark-Evans test for the accident points shows an R-value of 0.22579, which is less than 1. This indicates a clustered distribution. The p-value is less than 2.2e-16, which is extremely small and less than the significance level of 0.05. This means that we will reject the null hypothesis (\(H_0\)) and accept the alternative hypothesis (\(H_1\)).
Therefore, the statistical inference from this test is that the accident points are not randomly distributed but are clustered. This suggests that there may be underlying factors influencing the spatial distribution of these points.
5 First-Order Spatial Point Pattern Analysis
We can now start to perform first-order spatial point pattern analysis using functions from spatstat package.
First-order properties concern the characteristics of individual point locations and their variations of their density across space and are mostly addressed by density-based techniques, such as quadrant analysis and kernel density estimation.
Investigation of the intensity of a point pattern is one of the first and most important steps in point pattern analysis. If the point process has an intensity function λ(u), this function can be estimated non-parametrically by kernel estimation. Kernel estimation allows for smoothing of the probability density estimation of a random variable (in this analysis a point event) based on kernels as weights.
5.1 Rescaling acc_bmr_ppp
The EPSG: 32647 Coordinate References System uses meters as the standard unit. Thus, acc_bmr_ppp prepared earlier is also in metres. However, we will need to convert the measuring unit from metre to kilometeres when calculating the kernel density estimators for entirety of BMR because kilometers provide a more appropriate scale for analysing large areas.
acc_bmr_ppp.km <- rescale(acc_bmr_ppp, 1000, "km")5.2 Computing Default Kernel Density Estimation
KDE is particularly effective in identifying areas with high occurrences of traffic accidents. It divides the study area into cells, overlays a symmetrical, curved surface on each accident location, and aggregates these values within a given radius to estimate density. The density is highest at the accident location and decreases with distance, reflecting a distance decay effect.
To do that, we use density.ppp() from the spatstat package.
Show the code
par(bg = '#E4D5C9',
mar = c(0,0,1,0))
kde_default <- density(acc_bmr_ppp.km)
plot(kde_default, main = "Default Density KDE")
contour(kde_default, add=TRUE)
The key argument to pass to the density() function for point pattern objects is sigma, which determines the smoothing bandwidth of the kernel. A narrow bandwidth can reveal local fluctuations, useful for detailed analyses, while a larger bandwidth smooths out variations, providing a general overview.
By default, when the sigma value isn’t provided, a bandwidth is determined by a simple rule of thumb that depends only on the size of the window. This default setting might not always give the desired result. In the KDE plot we generated, there’s evidence of over-smoothing, where only one large spatial cluster is visible, potentially hiding smaller clusters or important details.
To address this, we can manually set the bandwidth using the sigma argument or choose a different kernel function through the kernel argument. This will help create more intuitive and detailed KDE maps that better capture the structure of the data.
5.3 KDE Layers with Fixed Bandwidth
5.3.1 Computing Fixed Bandwidths Using Different Bandwidth Selection Methods
4 automatic bandwidth calculation methods are available:
bw.diggle(): In the Cross Validated Bandwidth Selection, the bandwidth is chosen to minimise the mean-square error criterion. The mean-square error is a measure of the average of the squares of the errors - that is, the average squared difference between the estimated values and the actual value.bw.CvL(): In the Cronie and van Lieshout’s Criterion for Bandwidth Selection, the bandwidth is chosen to minimise the discrepancy between the area of the observation window and the sum of reciprocal estimated intensity values at the points of the point process. This method aims to choose a bandwidth that best represents the underlying point process, taking into account both the observed points and the area they occupy.bw.scott(): In the Scott’s Rule for Bandwidth Selection, the bandwidth is computed by the rule of thumb where the bandwidth is proportional to \(n^{-1/(d+4)}\), where n is the number of points and d is the number of spatial dimensions. This method is useful for estimating gradual trend.bw.ppl(): In the Likelihood Cross Validation Bandwidth Selection, the bandwidth is chosen to maximise the point process likelihood.
bw_diggle <- bw.diggle(acc_bmr_ppp.km)
bw_diggle sigma
0.04025879
bw_CvL <- bw.CvL(acc_bmr_ppp.km)
bw_CvL sigma
11.55349
bw_scott <- bw.scott(acc_bmr_ppp.km)
bw_scott sigma.x sigma.y
4.527996 3.318169
bw_ppl <- bw.ppl(acc_bmr_ppp.km)
bw_ppl sigma
0.3493275
Observations
Note that bw_diggle, bw_CvL and bw_ppl functions produce a numeric sigma value, while bw_scott provides a separate bandwidth for the x and y coordinates respectively. The sigma.x and sigma.y values represent the amount of smoothing applied in each direction when estimating the kernel density.
We can specify isotropic=TRUE argument when calculating bw_scott() method to produce a single value bandwidth, or use the bw.scott.iso() function instead.
bw_scott_iso <- bw.scott.iso(acc_bmr_ppp.km)
bw_scott_iso sigma
3.876165
par(bg = '#E4D5C9',
#mar = c(0,0,1,0),
mfrow = c(1,2))
plot(bw_diggle, xlim=c(0.0,0.06), ylim=c(-160,100))
plot(bw_CvL)
par(bg = '#E4D5C9',
#mar = c(0,0,1,0),
mfrow = c(1,2))
plot(bw_scott, main="bw_scott")
plot(bw_ppl,
xlim=c(-1,5),
ylim=c(00,30000))
5.3.2 Choosing Fixed-Bandwidth KDE
kde_diggle <- density(acc_bmr_ppp.km, bw_diggle)
kde_CvL <- density(acc_bmr_ppp.km, bw_CvL)
kde_scott <- density(acc_bmr_ppp.km, bw_scott)
kde_ppl <- density(acc_bmr_ppp.km, bw_ppl)
par(bg = '#E4D5C9',
mar = c(1,1,1,1.5),
mfrow = c(2,2))
plot(kde_diggle,main = "kde_diggle")
plot(kde_CvL,main = "kde_CvL")
plot(kde_scott,main = "kde_scott")
plot(kde_ppl,main = "kde_ppl")
Next, we will try to plot histograms to compare the distribution of KDE values obtained from density() function using different bandwidth selection methods.
par(bg = '#E4D5C9',
mar = c(2,2,2,2),
mfrow = c(2,2))
hist(kde_diggle,main = "kde_diggle")
hist(kde_CvL,main = "kde_CvL")
hist(kde_scott,main = "kde_scott")
hist(kde_ppl,main = "kde_ppl")
Observations
kde_diggle and **kde_ppl*: The sharp spike at the beginning indicates a high concentration of points within the first bin, while the remaining bins show little to no density. This pattern suggests that a specific area within our observation window experiences significant spatial clustering, with other areas showing much less activity.
kde_CvL: This method results in a more evenly spread distribution, indicating that spatial point concentrations are more dispersed across the area. However, the smaller bin size in this method may smooth out the data too much, potentially masking smaller, more localised clusters or important spatial patterns.
kde_scott: Compared to the other methods, kde_scott shows a wider range of values and a less pronounced initial spike, suggesting it captures both highly concentrated and moderately concentrated areas more effectively. This method balances the spatial distribution better by not overly smoothing or skewing the data.
Based on these observations, we will proceed with the scott method for further analysis. The scott method strikes a good balance between bias and variance. If the bandwidth is too small, the estimate can become overly variable and noisy (high variance), as seen in the histograms for bw_diggle and bw_ppl. Conversely, if the bandwidth is too large, the data becomes oversmoothed, losing important spatial details (high bias), which is evident in the kde_CvL histogram. By balancing these two extremes, bw_scott provides a more comprehensive and detailed representation of spatial clustering without losing important nuances.
par(bg = '#E4D5C9',
mar = c(0,0,1,0))
kde_fixed_scott <- density(acc_bmr_ppp.km, bw_scott)
plot(kde_fixed_scott,
main = "Fixed-Bandwidth KDE for Accident Points (Using bw_scott)")
contour(kde_fixed_scott, add=TRUE)
Visual inspection reveals that the bandwidth suggested by the bw_scott method causes noticeable over-smoothing. Although automatic bandwidth selection offers a solid initial estimate, fine-tuning is often required to achieve more accurate results.
To counteract the over-smoothing, we will apply a simple adjustment by reducing the bandwidth by half. This reduction should help capture more detailed patterns in the data and avoid the loss of important spatial information.
par(bg = '#E4D5C9',
mar = c(0,0,1,0))
kde_fixed_scott <- density(acc_bmr_ppp.km, bw_scott/2)
plot(kde_fixed_scott,
main = "Fixed-Bandwidth KDE for Accident Points (Using bw_scott)")
contour(kde_fixed_scott, add=TRUE)
From the plot above, it seems that reducing the bandwidth (which shrinks the point cluster buffers) has lessened the over-smoothing effect while still clearly highlighting the accident hotspot areas.
5.3.3 Kernel Methods
There are 4 types of kernels in density.ppp(), namely: Gaussian, Epanechnikov, Quartic and Dics. Gaussian kernel is the default.
The code chunk below will be used to compute 3 more kernel density estimations by using these three kernel function.
kde_fixed_scott.gaussian <- density(acc_bmr_ppp.km,
sigma=bw_scott,
edge=TRUE,
kernel="gaussian")
kde_fixed_scott.epanechnikov <- density(acc_bmr_ppp.km,
sigma=bw_scott,
edge=TRUE,
kernel="epanechnikov")
kde_fixed_scott.quartic <- density(acc_bmr_ppp.km,
sigma=bw_scott,
edge=TRUE,
kernel="quartic")
kde_fixed_scott.disc <- density(acc_bmr_ppp.km,
sigma=bw_scott,
edge=TRUE,
kernel="disc")
par(bg = '#E4D5C9',
mar = c(0,0,1,0),
mfrow = c(2,2))
plot(kde_fixed_scott.gaussian, main="Gaussian")
plot(kde_fixed_scott.epanechnikov, main="Epanechnikov")
plot(kde_fixed_scott.quartic, main="Quartic")
plot(kde_fixed_scott.disc, main="Disc")
Observations
While there are slight variations in smoothness and spread, all 4 plots exhibit similar density patterns. This suggests that the choice of kernel function has minimal impact on the overall KDE results. Therefore, we will not prioritise this factor in our analysis.
5.4 KDE Layers with Spatially Adaptive Bandwidth
The bandwidth of a kernel estimator can be either fixed across the entire mapping area or adaptive to suit in local situations. The fixed bandwidth method explored in our earlier analysis is said to be sensitive to highly skewed distribution of spatial point patterns over geographical units for example urban versus rural. One way to overcome this problem is by using adaptive bandwidth instead.
In the section below, we compare the fixed with adaptive bandwidth-based KDE, and how they were able to detect accident hot spots.
adaptive.density() of Spatstat offers 3 estimation methods:
- method = “voronoi”: which estimates the intensity using the Voronoi-Dirichlet tessellation
- method = “kernel”: which estimates the intensity using a variable-bandwidth kernel estimator
- `method = “nearest”: which computes an estimate of the intensity function of a point pattern dataset using the distance from each spatial location to the kth nearest points
kde_adaptive_vd <- adaptive.density(acc_bmr_ppp.km,
method = "voronoi")Show the code
par(bg = '#E4D5C9',
mar = c(0,0,1,0))
plot(kde_adaptive_vd,
main = "Voronoi-Dirichlet Adaptive Density Estimate")
kde_adaptive_kernel <- adaptive.density(acc_bmr_ppp.km,
method = "kernel")Show the code
par(bg = '#E4D5C9',
mar = c(0,0,1,0))
plot(kde_adaptive_kernel,
main = "Adaptive Kernel Density Estimate")
kde_adaptive_knn <- adaptive.density(acc_bmr_ppp.km,
method = "nearest",
k = 100)Show the code
par(bg = '#E4D5C9',
mar = c(0,0,1,0))
plot(kde_adaptive_knn,
main = "Nearest-Neighbour Adaptive Density Estimate")
5.4.1 Choosing Adaptive KDE Method
Just as we did with the fixed bandwidth, we can create histograms to compare the distribution of KDE values obtained from the density() function using different adaptive bandwidth selection methods.
par(bg = '#E4D5C9',
mar = c(2,2,2,2),
mfrow = c(2,2))
hist(kde_adaptive_vd, main = "Voronoi-Dirichlet Adaptive")
hist(kde_adaptive_kernel, main = "Adaptive Kernel")
hist(kde_adaptive_knn, main = "Nearest-Neighbour Adaptive")
Observations
Analysis of the outputs shows no significant differences in the KDE value distributions across the various methods. Each method reveals a high concentration of points in the same area. Given this consistency, we will opt for the Adaptive Kernel method because it provides a greater number of bins. This increased number of bins allows for a more granular and detailed view of the density distribution, which can enhance our analysis by offering finer insights into spatial patterns.
5.5 Plotting Interactive KDE Maps
raster_kde_fixed_scott <- raster(kde_fixed_scott)
raster_kde_adaptive_nn <- raster(kde_adaptive_knn)
raster_kde_adaptive_kernel <- raster(kde_adaptive_kernel)
projection(raster_kde_fixed_scott) <- CRS("+init=EPSG:32647 +units=km")
projection(raster_kde_adaptive_nn) <- CRS("+init=EPSG:32647 +units=km")
projection(raster_kde_adaptive_kernel) <- CRS("+init=EPSG:32647 +units=km")Show the code
tmap_mode('view')
kde_fixed_scott <- tm_basemap(server = "OpenStreetMap") +
tm_shape(raster_kde_fixed_scott) +
tm_raster("layer",
n = 10,
title = "KDE_Fixed_scott",
alpha = 0.6,
palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e")) +
tm_shape(adminboundary_bmr)+
tm_polygons(alpha=0.1,id="ADM1_EN")+
tmap_options(check.and.fix = TRUE)
kde_adaptive_nn <- tm_basemap(server = "OpenStreetMap") +
tm_shape(raster_kde_adaptive_nn) +
tm_raster("layer",
n = 7,
title = "KDE_Adaptive_nn",
style = "pretty",
alpha = 0.6,
palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e")) +
tm_shape(adminboundary_bmr)+
tm_polygons(alpha=0.1,id="ADM1_EN")+
tmap_options(check.and.fix = TRUE)
kde_adaptive_kernel <- tm_basemap(server = "OpenStreetMap") +
tm_shape(raster_kde_adaptive_kernel) +
tm_raster("layer",
n = 7,
title = "KDE_Adaptive_Kernel",
style = "pretty",
alpha = 0.6,
palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e")) +
tm_shape(adminboundary_bmr)+
tm_polygons(alpha=0.1,id="ADM1_EN")+
tmap_options(check.and.fix = TRUE)
tmap_arrange(kde_fixed_scott,
kde_adaptive_nn,
kde_adaptive_kernel,
ncol=1,
nrow=3,
sync = TRUE)Show the code
tmap_mode('plot')Observations
The analysis reveals that the highest concentration of accidents is found in Bangkok, particularly in the vicinity of the Bangkok−Ban Chang Motorway and the Bangkok Outer Ring Road. These roads are part of Motorway Route 7 and Route 9, respectively, where accident concentrations can reach up to 30 incidents. Additionally, Borommaratchachonnani Road, which is part of Highway 338 and runs along the western edge of Bangkok near the Chao Phraya River, also exhibits high accident rates. Other areas with notable accident density include major highways within the BMR.
Different maps suggests different stories. The Adaptive Nearest Neighbour and Adaptive Kernel KDE methods both identify a significant concentration of accidents in the Khlong Chang Tai area along Highway 3701 in Samut Prakan, with a reported high concentration of 600 incidents. This area stands out due to its substantial accident rates, suggesting that it might be a focal point for further investigation and intervention. This was not identified in the Fixed Bandwidth KDE.
Overall, the findings indicate that accident hotspots are predominantly located around major roadways and highways, with specific regions exhibiting particularly high accident densities. This pattern highlights the need for targeted road safety measures and further analysis in these critical areas.
5.6 Province-Level KDE
For a more detailed look, we will create province-area level KDE maps for 2 provinces identified. In order to create such maps, we will carry out additional data wrangling as required.
Firstly, we will filter out different planning areas as separate sf objects from adminboundary_bmr.
bkk = adminboundary_bmr %>% filter(ADM1_EN == "Bangkok")
spk = adminboundary_bmr %>% filter(ADM1_EN == "Samut Prakan")
par(bg = '#E4D5C9',
mar = c(1,1,1,0),
mfrow=c(1,2))
plot(st_geometry(bkk), main = "Bangkok")
plot(st_geometry(spk), main = "Samut Prakan")
Next, we will create owin objects to represent the observation windows for respective planning area. Once owin objects are created, we will also filter accident locations in each observation window from the original acc_bmr_ppp ppp object.
bkk_owin = as.owin(bkk)
spk_owin = as.owin(spk)
acc_bkk_ppp = acc_bmr_ppp[bkk_owin]
acc_spk_ppp = acc_bmr_ppp[spk_owin]Now that we have prepared both owin and ppp objects for each planning area, we are ready to plot KDE maps. Similar to what we have done in previous section, we will try both fixed-bandwidth and adaptive bandwidth KDE maps.
5.6.1 Province-level Fixed Bandwidth KDE Maps
bkk_kde_scott <- density(acc_bkk_ppp, sigma=bw.scott, main="Bangkok")
spk_kde_scott <- density(acc_spk_ppp, sigma=bw.scott, main="Samut Prakan")
par(bg = '#E4D5C9',
mar = c(1,1,1,1.5),
mfrow = c(1,2))
plot(bkk_kde_scott,
main = "Fixed KDE - Bangkok")
contour(bkk_kde_scott,
add=TRUE)
plot(spk_kde_scott,
main = "Fixed KDE - Samut Prakan")
contour(spk_kde_scott,
add=TRUE)
5.6.2 Province-level Adaptive-Bandwidth KDE Maps
bkk_kde_adaptive_kernel <- adaptive.density(acc_bkk_ppp, method = "kernel")
spk_kde_adaptive_kernel <- adaptive.density(acc_spk_ppp, method = "kernel")
par(bg = '#E4D5C9', mar = c(1,1,1,1.5),mfrow = c(1,2))
plot(bkk_kde_adaptive_kernel,
main = "Adaptive KDE - Bangkok")
contour(bkk_kde_adaptive_kernel,
add=TRUE)
plot(spk_kde_adaptive_kernel,
main = "Adaptive KDE - Samut Prakan")
contour(spk_kde_adaptive_kernel,
add=TRUE)
Observations
The planning area-level KDE maps highlight some limitations when applied to smaller regions like provinces, despite their strength in visualising spatial data.
One issue is that fixed KDE maps tend to over-smooth the data, making it hard to identify key details, while adaptive KDE maps sometimes under-smooth, leading to too much noise. This makes it challenging to extract meaningful insights from the data.
Additionally, because KDE calculates values using grid pixels and Euclidean distance, the resulting grid blocks limit our ability to see detailed differences within smaller areas such as the example below. This restricts our ability to identify finer patterns within each region, reducing the effectiveness of the analysis at a more localised level.
6 Network Constrained Kernel Density Estimation (NKDE)
In real-world scenarios, events like accidents tend to follow specific networks, such as road systems, rather than being randomly spread out. Traditional Kernel Density Estimation (KDE) assumes events occur across an open, two-dimensional space, which doesn’t accurately reflect network-based patterns like road traffic. Since movement on these networks is confined to one-dimensional paths, Network Constrained Kernel Density Estimation (NKDE) provides an extension of spatial KDE.
NKDE estimates event density strictly along the network, dividing roads into small units called “lixels” (like one-dimensional pixels) and calculating event density at the center of these segments. Instead of using straight-line (Euclidean) distances between events, NKDE measures the shortest path along the network. This method adjusts the intensity estimate to reflect density along a road rather than across an area, providing a clearer interpretation.
In this section, we’ll focus on generating NKDE maps for the two provinces identified earlier, using the spNetwork package to better understand these road-based event patterns.
6.1 Extracting Road Networks for Bangkok and Samut Prakan
First, we will have to extract the road network details and accidents pertaining to our 2 provinces.
The code chunk below performs the following functions:
st_intersection()is used to find the geometries that overlap between roads_bmr (the road network data for the entire Bangkok Metropolitan Region) and the boundary of Bangkok or Samut Prakan.st_union()combines all individual geometries within the boundary of Bangkok or Samut Prakan into a single geometry to perform the intersection. This ensures that the entire boundary of Bangkok is treated as a single spatial unit during the intersection.- After the intersection,
filter()ensures that only LINESTRING geometries (i.e., line segments representing roads) are retained. This step removes any points or other geometries that may have resulted from the intersection. - Similar to the road network extraction, the
st_intersection()function is used to find the accidents (from accidents_bmr_extra, which contains accident data for the BMR) that fall within the Bangkok and Samut Prakan boundaries.
# Road Network
bkk_network <- st_intersection(roads_bmr, st_union(bkk)) %>%
filter(st_geometry_type(.) == "LINESTRING")
spk_network <- st_intersection(roads_bmr, st_union(spk)) %>%
filter(st_geometry_type(.) == "LINESTRING")
write_rds(bkk_network, "data/rds/bkk_network.rds")
write_rds(spk_network, "data/rds/spk_network.rds")# Accidents
bkk_acc <- st_intersection(accidents_bmr_extra, bkk)
spk_acc <- st_intersection(accidents_bmr_extra, spk)
write_rds(bkk_acc, "data/rds/bkk_acc.rds")
write_rds(spk_acc, "data/rds/spk_acc.rds")6.2 Preparing lixel objects
Before calculating the NKDE, the road network must be segmented into smaller units, called lixels, using the lixelize_lines() function from the spNetwork package. A lixel represents a linear pixel along the road network, which is used as a basis for estimating spatial density.
In the code chunk below, each lixel is set to a length of 500 meters, while the minimum length (mindist) is set to 250 meters. This means that if the final lixel in a segment is shorter than 250 meters, it will be merged with the preceding lixel. Lixels that are already shorter than the minimum distance remain unchanged.
Since we have narrowed down to the 2 provinces where road traffic accidents can be highly clustered around certain hotspots, a smaller lixel size (i.e., 500 meters) might offer more detailed insights into local variations in accident density. Larger lixels, like 1000 meters, can potentially smooth out these variations, missing critical hotspots, especially in densely populated urban areas.
bkk_lixels <- lixelize_lines(bkk_network,
500,
mindist = 250)
spk_lixels <- lixelize_lines(spk_network,
500,
mindist = 250)
write_rds(bkk_lixels, "data/rds/bkk_lixels.rds")
write_rds(spk_lixels, "data/rds/spk_lixels.rds")6.3 Generating line centre points
Next, lines_center() of spNetwork will be used to generate a SpatialPointsDataFrame with line centre points. The points are located at center of the line based on the length of the line.
bkk_samples <- lines_center(bkk_lixels)
spk_samples <- lines_center(spk_lixels)6.4 NKDE Calculation
spNetwork package supports 3 NKDE methods: simple, discontinuous and continuous. Gelb (2021) summarises the 3 methods as such:
Simple: This method replaces Euclidean distances with network distances to estimate density along the network. It adapts the kernel formula to calculate density over a linear unit rather than an areal unit, making it intuitive but potentially biased.
Discontinuous: This method addresses the bias in the simple method by ensuring the kernel function divides the density mass equally at intersections. It conserves mass, making it a more accurate estimator, but the density becomes discontinuous.
Continuous: To overcome the discontinuity, this method adjusts the density before intersections to make the function continuous while still conserving mass, offering a more intuitive and smooth result.
In the following code chunks, we will compare NKDE values across three different methods. After several attempts, we optimised the computation time by adjusting certain parameters: reducing the bandwidth (bw) to 500, setting the maximum depth (max_depth) to 8, aggregating 5 events within a certain distance (agg), and specifying a grid_shape of c(10, 10). With the gridded application of the NKDE, the calculation is then performed in each cell of the grid. These adjustments were necessary due to the large size of our dataset.
bkk_nkde_simple <- nkde(bkk_network,
events = bkk_acc,
w = rep(1, nrow(bkk_acc)),
samples = bkk_samples,
kernel_name = "quartic",
bw = 500, #<<
div= "bw",
method = "simple",
digits = 1,
tol = 1,
grid_shape = c(10,10),
max_depth = 8, #<<
agg = 5, #<<
sparse = TRUE,
verbose = FALSE,
study_area = bkk)spk_nkde_simple <- nkde(spk_network,
events = spk_acc,
w = rep(1, nrow(spk_acc)),
samples = spk_samples,
kernel_name = "quartic",
bw = 500,
div= "bw",
method = "simple",
digits = 1,
tol = 1,
grid_shape = c(10,10),
max_depth = 8,
agg = 5,
sparse = TRUE,
verbose = FALSE,
study_area = spk)bkk_nkde_discon <- nkde(bkk_network,
events = bkk_acc,
w = rep(1, nrow(bkk_acc)),
samples = bkk_samples,
kernel_name = "quartic",
bw = 500,
div= "bw",
method = "discontinuous",
digits = 1,
tol = 1,
grid_shape = c(10,10),
max_depth = 8,
agg = 5,
sparse = TRUE,
verbose = FALSE,
study_area = bkk)spk_nkde_discon <- nkde(spk_network,
events = spk_acc,
w = rep(1, nrow(spk_acc)),
samples = spk_samples,
kernel_name = "quartic",
bw = 500,
div= "bw",
method = "discontinuous",
digits = 1,
tol = 1,
grid_shape = c(10,10),
max_depth = 8,
agg = 5,
sparse = TRUE,
verbose = FALSE,
study_area = spk)bkk_nkde_cont <- nkde(bkk_network,
events = bkk_acc,
w = rep(1, nrow(bkk_acc)),
samples = bkk_samples,
kernel_name = "quartic",
bw = 500,
div= "bw",
method = "continuous",
digits = 1,
tol = 1,
grid_shape = c(10,10),
max_depth = 8,
agg = 5,
sparse = TRUE,
verbose = FALSE,
study_area = bkk)spk_nkde_cont <- nkde(spk_network,
events = spk_acc,
w = rep(1, nrow(spk_acc)),
samples = spk_samples,
kernel_name = "quartic",
bw = 500,
div= "bw",
method = "continuous",
digits = 1,
tol = 1,
grid_shape = c(10,10),
max_depth = 8,
agg = 5,
sparse = TRUE,
verbose = FALSE,
study_area = spk)6.5 Visualising NKDE
Before we can visualise the NKDE values, we have to patch in the computed density values into samples and lixels.
To enhance the readability of the results, we first multiply the obtained densities by the total number of accidents. This adjustment ensures that the spatial integral equals the number of events. Subsequently, we multiply this value by 1000 to derive the estimated number of accidents per kilometer. This process will provide a more intuitive understanding of the density distribution along the network.
# Bangkok
bkk_samples$nkde_simple <- bkk_nkde_simple*nrow(bkk_acc)*1000
bkk_lixels$nkde_simple <- bkk_nkde_simple*nrow(bkk_acc)*1000
bkk_samples$nkde_discont <- bkk_nkde_discon*nrow(bkk_acc)*1000
bkk_lixels$nkde_discont <- bkk_nkde_discon*nrow(bkk_acc)*1000
bkk_samples$nkde_cont <- bkk_nkde_cont*nrow(bkk_acc)*1000
bkk_lixels$nkde_cont <- bkk_nkde_cont*nrow(bkk_acc)*1000# Samut Prakan
spk_samples$nkde_simple <- spk_nkde_simple*nrow(spk_acc)*1000
spk_lixels$nkde_simple <- spk_nkde_simple*nrow(spk_acc)*1000
spk_samples$nkde_discont <- spk_nkde_discon*nrow(spk_acc)*1000
spk_lixels$nkde_discont <- spk_nkde_discon*nrow(spk_acc)*1000
spk_samples$nkde_cont <- spk_nkde_cont*nrow(spk_acc)*1000
spk_lixels$nkde_cont <- spk_nkde_cont*nrow(spk_acc)*1000#| code-fold: true
#| code-summary: "Show the code"
tmap_mode('view')
bkk_nkde_simple_map <- tm_basemap("OpenStreetMap") +
tm_shape(bkk_lixels)+
tm_lines(col="nkde_simple",
lwd = 2,
palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e"),
id="nkde_cont")+
tm_shape(bkk_acc)+
tm_dots(size=0.01)
bkk_nkde_discon_map <- tm_basemap("OpenStreetMap") +
tm_shape(bkk_lixels)+
tm_lines(col="nkde_discont",
lwd = 2,
palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e"),
id="nkde_cont")+
tm_shape(bkk_acc)+
tm_dots(size=0.01)
bkk_nkde_cont_map <- tm_basemap("OpenStreetMap") +
tm_shape(bkk_lixels)+
tm_lines(col="nkde_cont",
lwd = 2,
palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e"),
id="nkde_cont")+
tm_shape(bkk_acc)+
tm_dots(size=0.01)
tmap_arrange(bkk_nkde_simple_map,
bkk_nkde_discon_map,
bkk_nkde_cont_map,
ncol=3,
nrow=1,
sync = TRUE
)#| code-fold: true
#| code-summary: "Show the code"
tmap_mode('view')
spk_nkde_simple_map <- tm_basemap(server = "OpenStreetMap") +
tm_shape(spk_lixels)+
tm_lines(col="nkde_simple",
lwd = 2,
palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e"),
id="nkde_cont")+
tm_shape(spk_acc)+
tm_dots(size=0.01)
spk_nkde_discon_map <- tm_basemap(server = "OpenStreetMap") +
tm_shape(spk_lixels)+
tm_lines(col="nkde_discont",
lwd = 2,
palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e"),
id="nkde_cont")+
tm_shape(spk_acc)+
tm_dots(size=0.01)
spk_nkde_cont_map <- tm_basemap(server = "OpenStreetMap") +
tm_shape(spk_lixels)+
tm_lines(col="nkde_cont",
lwd = 2,
palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e"),
id="nkde_cont")+
tm_shape(spk_acc)+
tm_dots(size=0.01)
tmap_arrange(spk_nkde_simple_map,
spk_nkde_discon_map,
spk_nkde_cont_map,
ncol=3,
nrow=1,
sync = TRUE
)Observations
Although Network Kernel Density Estimation (NKDE) is a useful tool for visualising the impact of events on a road network, I found that the resulting NKDE maps weren’t as insightful as expected. This could be due to the high density of traffic accident points, which might obscure meaningful patterns, or potentially due to errors in the code implementation :(
To gain more detailed insights, I propose extending the analysis by focusing on smaller subsets of events, segmented by temporal patterns. In the next section, I will explore Temporal Network Kernel Density Estimation (TNKDE), which incorporates time-based factors such as the time of day or seasonality. By doing so, we can investigate how traffic accidents vary across different times and conditions, potentially revealing new patterns that NKDE alone may not capture.
7 Temporal Network Kernel Density Estination (TNKDE)
Temporal Network Kernel Density Estimation (TNKDE) is an advanced spatial analysis technique that integrates both spatial and temporal dimensions to provide a more nuanced understanding of event distributions along networks. Unlike traditional NKDE, which focuses solely on spatial data, TNKDE accounts for time-based variations such as time of day, day of the week, or even seasonal trends. This allows for a more detailed analysis of how events, like traffic accidents, fluctuate over both space and time.
The purpose of TNKDE is to capture temporal patterns within spatial networks, offering insights into how event intensities shift at different times. By incorporating time, TNKDE helps uncover spatio-temporal relationships that may be overlooked in a purely spatial analysis. This makes it especially useful for understanding dynamic events, like traffic accidents, where timing plays a critical role. Ultimately, TNKDE supports more informed decision-making for urban planning, road safety improvements, and traffic management by identifying accident hotspots based on specific temporal contexts.
7.1 Visualising Distribution
Recall the variables we derived in Section 3.1. Our TNKDE analysis will be focused on these factors:
flowchart TD
A[Variables] --> A1[Month]
A1 -.-> A11[Jan to Dec]
A --> A2[Day]
A2 -.-> A21[Mon to Sun]
A --> A3[Hour]
A3 -.-> A31[00 to 23 hr]
A --> A4[Songkran?]
A4 -.-> A41[Yes]
A4 -.-> A42[No]
A --> A5[Weektype]
A5 -.-> A51[Weekday]
A5 -.-> A52[Weekend]
A --> A6[Season]
A6 -.-> A61[Summer]
A6 -.-> A62[Rainy]
A6 -.-> A63[Winter]
A --> A7[Peak Period]
A7 -.-> A71[Morning Peak]
A7 -.-> A72[Evening Peak]
A7 -.-> A73[Non-peak]
Let’s visualise the distribution:
Show the code
plot_month <- ggplot(data = bkk_acc,
aes(x = month)) +
geom_bar()+
ylim(0, 800) +
geom_text(stat="count",
aes(label=paste0(after_stat(count), ", ",
round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
vjust=-1,
size=2) +
labs(x = "Month",
y = "No. of Accidents",
title = "Accidents by Month") +
scale_x_continuous(breaks = 1:12) +
theme_minimal() +
theme(
plot.title = element_text(face="bold", size = 8),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
)
plot_day <- ggplot(data = bkk_acc,
aes(x = day)) +
geom_bar()+
ylim(0, 1200) +
geom_text(stat="count",
aes(label=paste0(after_stat(count), ", ",
round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
vjust=-1,
size=2) +
labs(x = "Month",
y = "No. of Accidents",
title = "Accidents by Day of Week") +
scale_x_continuous(breaks = 1:7) +
theme_minimal() +
theme(
plot.title = element_text(face="bold", size = 8),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
)
plot_hour <- ggplot(data = bkk_acc,
aes(x = hour)) +
geom_bar()+
ylim(0, 700) +
geom_text(stat="count",
aes(label=paste0(after_stat(count), ", ",
round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
vjust=-1,
size=2,
angle = 45,
hjust = -0.25) +
labs(x = "Month",
y = "No. of Accidents",
title = "Accidents by Time of Day") +
scale_x_continuous(breaks = 0:23) +
theme_minimal() +
theme(
plot.title = element_text(face="bold", size = 8),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
)
gridExtra::grid.arrange(plot_month, plot_day,
plot_hour,
ncol=1, nrow = 3)
Show the code
plot_songkran <- ggplot(data = bkk_acc,
aes(x = reorder(songkran, songkran, function(x)-length(x)))) +
geom_bar()+
ylim(0, 13500) +
geom_text(stat="count",
aes(label=paste0(after_stat(count), ", ",
round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
vjust=-1,
size=3) +
labs(x = "Songkran?",
y = "No. of Accidents",
title = "Accidents During Songkran") +
theme_minimal() +
theme(
plot.title = element_text(face="bold", size = 8),
axis.title.y = element_text(size = 6),
axis.title.x = element_blank(),
plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
) +
scale_x_discrete(labels=c('No', 'Yes'))
plot_weektype <- ggplot(data = bkk_acc,
aes(x = reorder(weektype, weektype, function(x)-length(x)))) +
geom_bar()+
ylim(0, 6000) +
geom_text(stat="count",
aes(label=paste0(after_stat(count), ", ",
round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
vjust=-1,
size=3) +
labs(x = "Weektype",
y = "No. of Accidents",
title = "Accidents by Weektype") +
theme_minimal() +
theme(
plot.title = element_text(face="bold", size = 8),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
)
plot_season <- ggplot(data = bkk_acc,
aes(x = reorder(season, season, function(x)-length(x)))) +
geom_bar()+
ylim(0, 6000) +
geom_text(stat="count",
aes(label=paste0(after_stat(count), ", ",
round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
vjust=-1,
size=3) +
labs(x = "Season",
y = "No. of Accidents",
title = "Accidents by Season") +
theme_minimal() +
theme(
plot.title = element_text(face="bold", size = 8),
axis.title.y = element_text(size = 6),
axis.title.x = element_blank(),
plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
)
plot_peakperiod <- ggplot(data = bkk_acc,
aes(x = reorder(peakperiod, peakperiod, function(x)-length(x)))) +
geom_bar()+
ylim(0, 6000) +
geom_text(stat="count",
aes(label=paste0(after_stat(count), ", ",
round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
vjust=-1,
size=3) +
labs(x = "Season",
y = "No. of Accidents",
title = "Accidents by Peak/Non-Peak Period") +
theme_minimal() +
theme(
plot.title = element_text(face="bold", size = 8),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
)
gridExtra::grid.arrange(plot_songkran, plot_weektype,
plot_season, plot_peakperiod,
ncol=2, nrow = 2)
Show the code
plot_month_spk <- ggplot(data = spk_acc,
aes(x = month)) +
geom_bar()+
ylim(0, 300) +
geom_text(stat="count",
aes(label=paste0(after_stat(count), ", ",
round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
vjust=-1,
size=2) +
labs(x = "Month",
y = "No. of Accidents",
title = "Accidents by Month") +
scale_x_continuous(breaks = 1:12) +
theme_minimal() +
theme(
plot.title = element_text(face="bold", size = 8),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
)
plot_day_spk <- ggplot(data = spk_acc,
aes(x = day)) +
geom_bar()+
ylim(0, 500) +
geom_text(stat="count",
aes(label=paste0(after_stat(count), ", ",
round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
vjust=-1,
size=2) +
labs(x = "Month",
y = "No. of Accidents",
title = "Accidents by Day of Week") +
scale_x_continuous(breaks = 1:7) +
theme_minimal() +
theme(
plot.title = element_text(face="bold", size = 8),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
)
plot_hour_spk <- ggplot(data = spk_acc,
aes(x = hour)) +
geom_bar()+
ylim(0, 300) +
geom_text(stat="count",
aes(label=paste0(after_stat(count), ", ",
round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
vjust=-1,
size=2,
angle = 45,
hjust = -0.25) +
labs(x = "Month",
y = "No. of Accidents",
title = "Accidents by Time of Day") +
scale_x_continuous(breaks = 0:23) +
theme_minimal() +
theme(
plot.title = element_text(face="bold", size = 8),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
)
gridExtra::grid.arrange(plot_month_spk, plot_day_spk,
plot_hour_spk,
ncol=1, nrow = 3)
Show the code
plot_songkran_spk <- ggplot(data = spk_acc,
aes(x = reorder(songkran, songkran, function(x)-length(x)))) +
geom_bar()+
ylim(0, 3000) +
geom_text(stat="count",
aes(label=paste0(after_stat(count), ", ",
round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
vjust=-1,
size=3) +
labs(x = "Songkran?",
y = "No. of Accidents",
title = "Accidents During Songkran") +
theme_minimal() +
theme(
plot.title = element_text(face="bold", size = 8),
axis.title.y = element_text(size = 6),
axis.title.x = element_blank(),
plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
) +
scale_x_discrete(labels=c('No', 'Yes'))
plot_weektype_spk <- ggplot(data = spk_acc,
aes(x = reorder(weektype, weektype, function(x)-length(x)))) +
geom_bar()+
ylim(0, 2000) +
geom_text(stat="count",
aes(label=paste0(after_stat(count), ", ",
round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
vjust=-1,
size=3) +
labs(x = "Weektype",
y = "No. of Accidents",
title = "Accidents by Weektype") +
theme_minimal() +
theme(
plot.title = element_text(face="bold", size = 8),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
)
plot_season_spk <- ggplot(data = spk_acc,
aes(x = reorder(season, season, function(x)-length(x)))) +
geom_bar()+
ylim(0, 1500) +
geom_text(stat="count",
aes(label=paste0(after_stat(count), ", ",
round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
vjust=-1,
size=3) +
labs(x = "Season",
y = "No. of Accidents",
title = "Accidents by Season") +
theme_minimal() +
theme(
plot.title = element_text(face="bold", size = 8),
axis.title.y = element_text(size = 6),
axis.title.x = element_blank(),
plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
)
plot_peakperiod_spk <- ggplot(data = spk_acc,
aes(x = reorder(peakperiod, peakperiod, function(x)-length(x)))) +
geom_bar()+
ylim(0, 2500) +
geom_text(stat="count",
aes(label=paste0(after_stat(count), ", ",
round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
vjust=-1,
size=3) +
labs(x = "Season",
y = "No. of Accidents",
title = "Accidents by Peak/Non-Peak Period") +
theme_minimal() +
theme(
plot.title = element_text(face="bold", size = 8),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
)
gridExtra::grid.arrange(plot_songkran_spk, plot_weektype_spk,
plot_season_spk, plot_peakperiod_spk,
ncol=2, nrow = 2)
Observations
We observe a significantly higher number of accidents occurring during non-peak periods and on weekdays in both Bangkok and Samut Prakan. Additionally, a notable proportion of accidents take place during the rainy season in these provinces.
Upon closer examination, while the distribution of accidents across different days of the week and months appears fairly even, there is considerable variability in occurrences at different times of day. In Bangkok, the peak times are particularly from 7 AM to 11 AM and around 4 PM and 7 PM. In Samut Prakan, accident occurrences are notably higher between 7 AM and 8 PM compared to other times.
Interestingly, despite many reports highlighting elevated accident rates during the Songkran festival, incidents during this week-long event account for less than 5% of the total accidents recorded in the study period.
7.2 Temporal Dimension
Next, we will compute kernel density estimates of accidents over time using multiple bandwidths with the tnkde() function from the spNetwork package. TNKDE requires two bandwidths: one for spatial data and one for temporal data. In this section, we will explore three different methods for selecting bandwidths:
- bw_bcv: Utilises biased cross-validation to find the bandwidth that minimises estimation errors.
- bw_ucv: Implements unbiased cross-validation for bandwidth selection.
- bw_SJ: Estimates bandwidths using pilot estimation of derivatives.
Steps:
- We will start by creating a vector called weight that will have the same number of elements as rows in the bkk_acc dataset, which will be used as the weighting factor in the TNKDE calculations.
- After that, we’ll generate a sequence of numbers ranging from 0 to the maximum value in the hour column of the bkk_acc data frame, with intervals of 0.5. These points will serve as the sample locations for estimating the density.
- Next, we’ll calculate the bandwidth values for the hour column using the three bandwidth selection methods mentioned earlier.
- Once the bandwidth values are obtained, we will perform the TNKDE analysis using the
tkde()function from the spPackage and then create a data frame called tk to store the resulting density estimates. - Finally, we will apply the melt function from the reshape2 package to reshape the tk data frame into a new data frame named df_hour, using the hour variable as the identifier.
## Bangkok
weight_bkk <- rep(1, nrow(bkk_acc))
samples_bkk <- seq(0, max(bkk_acc$hour), 0.5)
## Sanut Prakan
weight_spk <- rep(1, nrow(spk_acc))
samples_spk <- seq(0, max(spk_acc$hour), 0.5)## Bangkok
tk_bkk <- data.frame(
bw_bcv = tkde(bkk_acc$hour, w = weight_bkk, samples = samples_bkk, bw = bw_bcv_bkk, kernel_name = "quartic"),
bw_ucv = tkde(bkk_acc$hour, w = weight_bkk, samples = samples_bkk, bw = bw_ucv_bkk, kernel_name = "quartic"),
bw_SJ = tkde(bkk_acc$hour, w = weight_bkk, samples = samples_bkk, bw = bw_SJ_bkk, kernel_name = "quartic"),
hour = samples_bkk
)
df_hour_bkk <- reshape2::melt(tk_bkk,
id.vars = "hour")
## Sanut Prakan
tk_spk <- data.frame(
bw_bcv = tkde(spk_acc$hour, w = weight_spk, samples = samples_spk, bw = bw_bcv_spk, kernel_name = "quartic"),
bw_ucv = tkde(spk_acc$hour, w = weight_spk, samples = samples_spk, bw = bw_ucv_spk, kernel_name = "quartic"),
bw_SJ = tkde(spk_acc$hour, w = weight_spk, samples = samples_spk, bw = bw_SJ_spk, kernel_name = "quartic"),
hour = samples_spk
)
df_hour_spk <- reshape2::melt(tk_spk,
id.vars = "hour")Show the code
plot_tk_bkk <- ggplot(data = df_hour_bkk,
aes(x = hour, y = value)) +
geom_line()+
theme_minimal() +
theme(
plot.title = element_text(face="bold", size = 8),
axis.title = element_text(size = 8),
axis.text = element_text(size = 6),
plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9"),
strip.background=element_rect(fill="#f4e9e8")
) +
facet_wrap(vars(variable), ncol=1, nrow=3, scales = "free") +
scale_x_continuous(breaks = 0:23)
plot_tk_bkk
Show the code
plot_tk_spk <- ggplot(data = df_hour_spk,
aes(x = hour, y = value)) +
geom_line()+
theme_minimal() +
theme(
plot.title = element_text(face="bold", size = 8),
axis.title = element_text(size = 8),
axis.text = element_text(size = 6),
plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9"),
strip.background=element_rect(fill="#f4e9e8")
) +
facet_wrap(vars(variable), ncol=1, nrow=3, scales = "free") +
scale_x_continuous(breaks = 0:23)
plot_tk_spk
Observations
Both bw_ucv and bw_SJ display a wider range of values and more fluctuations. In contrast, bw_bcv shows a smoother pattern with less variability in its range. The differences in these plots highlight how the choice of bandwidth can significantly affect the resulting density estimate.
These differences in the plots emphasise how the choice of bandwidth can have a substantial impact on the resulting density estimates.
7.3 Optimal Bandwidths
For effective implementation of TNKDE, it’s essential to consider both spatial and temporal dimensions. To achieve this, we can use the leave-one-out cross-validation method to determine appropriate bandwidths for both aspects. The bw_tnkde_cv_likelihood_calc() function from spNetwork can be employed to evaluate the cross-validation likelihood for various bandwidths, allowing us to select the most fitting options for both the network and time dimensions based on data-driven insights.
In the code chunk below, the following arguments are used: - bws_net: An ordered numeric vector with all the network bandwidths consisting of range (starting, ending value), and step size. - bws_time: An ordered numeric vector with all the time bandwidths with range (starting, ending value), and step size. - lines: linestrings representing the underlying network - events: points representing the events on the network i.e. our points of accidents - timefield: this is the hour column derived in our earlier calculation - agg, max_depth, grid_shape are kept consistent from our NKDE calculations.
cv_bkk <- bw_tnkde_cv_likelihood_calc(
bws_net = seq(100,1000,100),
bws_time = seq(10,60,5),
lines = bkk_network,
events = bkk_acc,
time_field = "hour",
w = rep(1, nrow(bkk_acc)),
kernel_name = "quartic",
method = "discontinuous",
diggle_correction = FALSE,
study_area = NULL,
max_depth = 8,
digits = 1,
tol = 0.1,
agg = 5,
sparse=TRUE,
grid_shape=c(10,10),
sub_sample=1,
verbose = FALSE,
check = TRUE)
write_rds(cv_bkk, "data/rds/cv_bkk.rds")To visualise the output:
| 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60 | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | -181.34325 | -150.86581 | -139.77335 | -136.38724 | -136.61409 | -136.82656 | -137.02051 | -137.19712 | -137.35844 | -137.50654 | -137.64319 |
| 200 | -115.58330 | -96.21737 | -88.81100 | -87.26863 | -87.52025 | -87.75267 | -87.96406 | -88.15628 | -88.33176 | -88.49280 | -88.64138 |
| 300 | -88.22241 | -72.98540 | -68.66435 | -66.12972 | -66.39108 | -66.63200 | -66.85092 | -67.04990 | -67.23152 | -67.39818 | -67.55193 |
| 400 | -70.82133 | -60.13231 | -57.07147 | -55.44705 | -55.71491 | -55.96074 | -56.18380 | -56.38641 | -56.57128 | -56.74089 | -56.89733 |
| 500 | -61.29754 | -52.99925 | -50.85048 | -49.45982 | -49.73170 | -49.98047 | -50.20595 | -50.41066 | -50.59740 | -50.76870 | -50.92669 |
| 600 | -54.68852 | -48.21310 | -46.51937 | -45.35830 | -45.63296 | -45.88376 | -46.11092 | -46.31710 | -46.50514 | -46.67761 | -46.83667 |
| 700 | -49.86932 | -43.63428 | -42.73554 | -41.69092 | -41.96761 | -42.21998 | -42.44851 | -42.65589 | -42.84502 | -43.01849 | -43.17847 |
| 800 | -47.28646 | -40.84318 | -39.83741 | -38.90791 | -39.18626 | -39.43990 | -39.66949 | -39.87782 | -40.06781 | -40.24205 | -40.40275 |
| 900 | -44.47935 | -39.27202 | -38.49220 | -37.45284 | -37.73203 | -37.98631 | -38.21645 | -38.42526 | -38.61568 | -38.79033 | -38.95140 |
| 1000 | -42.55712 | -37.69901 | -37.14587 | -36.22036 | -36.50042 | -36.75529 | -36.98592 | -37.19517 | -37.38598 | -37.56098 | -37.72237 |
cv_spk <- bw_tnkde_cv_likelihood_calc(
bws_net = seq(100,1000,100),
bws_time = seq(10,60,5),
lines = spk_network,
events = spk_acc,
time_field = "hour",
w = rep(1, nrow(spk_acc)),
kernel_name = "quartic",
method = "discontinuous",
diggle_correction = FALSE,
study_area = NULL,
max_depth = 8,
digits = 1,
tol = 0.1,
agg = 5,
sparse=TRUE,
grid_shape=c(10,10),
sub_sample=1,
verbose = FALSE,
check = TRUE)
write_rds(cv_spk, "data/rds/cv_spk.rds")To visualise the output:
| 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60 | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | -237.91197 | -210.65241 | -197.30990 | -194.75027 | -194.95486 | -195.14583 | -195.32013 | -195.47882 | -195.62376 | -195.75681 | -195.87957 |
| 200 | -183.65331 | -156.26772 | -145.46229 | -142.01147 | -142.23908 | -142.45049 | -142.64311 | -142.81837 | -142.97841 | -143.12530 | -143.26082 |
| 300 | -150.72171 | -124.94252 | -115.39506 | -113.80690 | -114.04852 | -114.27148 | -114.47425 | -114.65859 | -114.82686 | -114.98127 | -115.12371 |
| 400 | -123.28002 | -100.61008 | -93.84198 | -92.57890 | -92.83060 | -93.06205 | -93.27233 | -93.46342 | -93.63783 | -93.79786 | -93.94547 |
| 500 | -107.83014 | -87.33140 | -80.58086 | -79.63046 | -79.88865 | -80.12542 | -80.34035 | -80.53562 | -80.71380 | -80.87728 | -81.02808 |
| 600 | -92.92595 | -73.98646 | -67.55693 | -67.23058 | -67.49554 | -67.73765 | -67.95717 | -68.15651 | -68.33837 | -68.50520 | -68.65908 |
| 700 | -83.85252 | -68.59219 | -62.78504 | -62.46360 | -62.73159 | -62.97590 | -63.19726 | -63.39821 | -63.58151 | -63.74965 | -63.90472 |
| 800 | -78.14438 | -64.74374 | -59.55715 | -59.23880 | -59.50865 | -59.75436 | -59.97690 | -60.17890 | -60.36315 | -60.53215 | -60.68802 |
| 900 | -75.49196 | -62.40967 | -57.22976 | -56.91380 | -57.18506 | -57.43182 | -57.65525 | -57.85803 | -58.04299 | -58.21263 | -58.36909 |
| 1000 | -67.95707 | -56.42548 | -50.64316 | -50.02889 | -50.30212 | -50.55113 | -50.77675 | -50.98157 | -51.16843 | -51.33984 | -51.49794 |
Observations
The optimal set of bandwidths that gives the least cv score is 1000 metres and 25 hrs.
7.4 Calculating TNKDE
Using the optimal set of spatial and temporal bandwidths, we can perform TNKDE calculation.
We first create a sequence of hourly time points from our dataset. min() retrieves the minimum value while max() retrieves the maximum value from the hour column. seq() generates a sequence of numbers from the minimum hour and ending at the maximum hour, with an increment of 1 (i.e. giving us hourly intervals). This sequence will be used to visualise accident data at different times of the day.
bkk_sample_time <- seq(min(bkk_acc$hour), max(bkk_acc$hour), 1)bkk_tnkde_densities <- tnkde(lines = bkk_network,
events = bkk_acc,
time_field = "hour",
w = rep(1, nrow(bkk_acc)),
samples_loc = bkk_samples,
samples_time = bkk_sample_time,
kernel_name = "quartic",
bw_net = 1000, bw_time = 25,
adaptive = TRUE,
trim_bw_net = 900,
trim_bw_time = 8,
method = "discontinuous",
div = "bw", max_depth = 8,
digits = 1, tol = 0.01,
agg = 5, grid_shape = c(10,10),
verbose = FALSE)spk_sample_time <- seq(min(spk_acc$hour), max(spk_acc$hour), 1)spk_tnkde_densities <- tnkde(lines = spk_network,
events = spk_acc,
time_field = "hour",
w = rep(1, nrow(spk_acc)),
samples_loc = spk_samples,
samples_time = spk_sample_time,
kernel_name = "quartic",
bw_net = 1000, bw_time = 25,
adaptive = TRUE,
trim_bw_net = 900,
trim_bw_time = 8,
method = "discontinuous",
div = "bw", max_depth = 8,
digits = 1, tol = 0.01,
agg = 5, grid_shape = c(10,10),
verbose = FALSE)7.5 Visualising TNKDE
7.5.1 Bangkok
Show the code
# Create a color palette for all the densities
all_densities_bkk <- c(bkk_tnkde_densities$k)
color_breaks_bkk <- classInt::classIntervals(all_densities_bkk,
n = 10,
style = "kmeans")
# Generate a map at each sample time
maps_bkk <- lapply(1:ncol(bkk_tnkde_densities$k), function(i){
time <- bkk_sample_time[[i]]
bkk_samples$tnkde_density <- bkk_tnkde_densities$k[,i]
map1 <- tm_shape(bkk_samples) +
tm_dots(col = "tnkde_density", size = 0.01,
breaks = color_breaks_bkk$brks,
palette = viridis::viridis(10)) +
tm_layout(legend.show=FALSE,
main.title = paste("TNKDE of Accidents in Bangkok - ", as.character(time), ":00"),
main.title.size = 0.5,
bg.color = "#E4D5C9",
frame = F)
return(map1)
})
# Create a gif with all the maps
tmap_animation(maps_bkk, filename = "images/tnkde_bkk.gif",
width = 1000, height = 1000, dpi = 300, delay = 50)
Although hotspot locations shift on an hourly basis, the hourly maps visually smooth out these changes between hours. In the animation below, we applied the same process but with a 3-hour interval, allowing for clearer visualisation of hotspot variations over time.

7.5.2 Samut Prakan
Show the code
# Create a color palette for all the densities
all_densities_spk <- c(spk_tnkde_densities$k)
color_breaks_spk <- classInt::classIntervals(all_densities_spk, n = 10, style = "kmeans")
# Generate a map at each sample time
maps_spk <- lapply(1:ncol(spk_tnkde_densities$k), function(i){
time <- spk_sample_time[[i]]
spk_samples$tnkde_density <- spk_tnkde_densities$k[,i]
map1 <- tm_shape(spk_samples) +
tm_dots(col = "tnkde_density", size = 0.01,
breaks = color_breaks_spk$brks, palette = viridis::viridis(10)) +
tm_layout(legend.show=FALSE,
main.title = paste("TNKDE of Accidents in Samut Prakan - ", as.character(time), ":00"),
main.title.size = 0.5,
bg.color = "#E4D5C9",
frame = F)
return(map1)
})
# Create a gif with all the maps
tmap_animation(maps_spk, filename = "images/tnkde_spk.gif",
width = 1000, height = 1000, dpi = 300, delay = 50)
Using 3-hour intervals:

Observations
In Section 7.1, we identified specific time windows—7am to 11am, 4pm, and 7pm—as periods with higher accident occurrences in Bangkok province. The TNKDE map for Bangkok aligns with these observations, revealing more prominent accident hotspots (highlighted by brighter colors) as the day progresses, while fewer hotspots are seen from late night through early morning. Notably, the areas with consistent hotspots throughout the day appear to be concentrated along major highways, which are likely key transport routes in Bangkok, contributing to the elevated accident rates. Fewer accidents are observed outside these primary roads.
Similarly, in the previous section, we identified 7am to 8pm as the period with the highest frequency of accidents in Samut Prakan. This observation is supported by the TNKDE map, which shows that accident hotspots remain fairly consistent throughout the day and significantly diminish in the early morning hours. Like Bangkok, the accident clusters in Samut Prakan tend to occur in the same areas throughout the day, suggesting these are high-traffic routes or major roads.
The TNKDE maps provide more granular insights into how accident clusters evolve over time, offering a clearer picture of how accidents are spatially concentrated across different times of day. This enhanced level of detail aids in understanding how accident hotspots shift temporally and spatially within the network. Such insights can be instrumental for traffic safety planning, as they allow for the identification of critical accident-prone areas and trends over time. Overall, the TNKDE approach delivers a robust analysis of both spatial and temporal clustering, offering a deeper and more comprehensive understanding of the patterns that drive accident occurrences across the network.
8 Conclusion
This study examined different methods of spatial point pattern analysis using road accident data from Thailand. We utilised three distinct approaches: Traditional Kernel Density Estimation (KDE), Network-Constrained Kernel Density Estimation (NKDE), and Temporal Network Kernel Density Estimation (TNKDE). These techniques helped uncover both spatial and temporal trends in accident clustering within the Bangkok Metropolitan Region (BMR).
We experimented with various bandwidth selection techniques and kernel functions, comparing the outcomes for each combination. We produced three KDE maps using different strategies: fixed bandwidth, adaptive nearest neighbor, and adaptive kernel density. These maps pinpointed potential accident hotspots in two provinces within the BMR, namely Bangkok and Samut Prakan, where accidents showed significant spatial clustering.
A key part of our analysis involved comparing the performance of traditional, pixel-based KDE with network-based NKDE. Traditional KDE was limited in its ability to capture the spatial distribution of accidents along road networks due to its reliance on Euclidean distance. NKDE, on the other hand, offered a more precise and accurate representation by accounting for the constraints of road networks. This approach led to a more nuanced understanding of accident hotspots, providing clearer and more reliable results than the traditional method.
The introduction of TNKDE added further depth to the analysis by incorporating time as a factor, allowing us to observe how accident clusters vary over different time intervals. This provided a more dynamic perspective on the spatial distribution of accidents.
Looking ahead, future research could incorporate additional data from the original Thailand Road Accidents dataset, such as weather conditions, road type, and traffic characteristics, to provide a more holistic view of the factors contributing to road accidents. Integrating these variables could enhance the analysis and offer a richer understanding of accident patterns.
9 References
Gelb, J. (2021). spNetwork: A Package for Network Kernel Density Estimation. The R Journal, 13(2). https://journal.r-project.org/archive/2021/RJ-2021-102/RJ-2021-102.pdf
Gelb, J. (2024). Temporal Network Kernel Density Estimate. spNetwork. https://jeremygelb.github.io/spNetwork/articles/TNKDE.html#spatio-temporal
Gelb, J. (2024). Network Kernel Density Estimate. spNetwork. https://jeremygelb.github.io/spNetwork/articles/NKDE.html
Kam, T.S. (2023). 1st Order Spatial Point Patterns Analysis Methods. R for Geospatial Data Science and Analytics. https://r4gdsa.netlify.app/chap04
Kam, T.S. (2023). 2nd Order Spatial Point Patterns Analysis Methods. R for Geospatial Data Science and Analytics. https://r4gdsa.netlify.app/chap05
Kam, T.S. (2023). Spatio-Temporal Point Patterns Analysis. R for Geospatial Data Science and Analytics. https://r4gdsa.netlify.app/chap06
Kam, T.S. (2023). Network Constrained Spatial Point Patterns Analysis. R for Geospatial Data Science and Analytics. https://r4gdsa.netlify.app/chap07